Validate Checkpoint in ChangePack for PushPull API requests #959

binary-ho · 2024-08-10T17:20:47Z

1. What this PR does / why we need it:

This PR validates the Checkpoint when a client calls the PushPull API and sends a request ChangePack. This validation is necessary to handle potential issues such as network-induced duplicate requests, bugs in a new SDK, or malicious tampering.

Specifically, four scenarios need to be validated:

The ClientSeqs in reqPack.Changes are sequential.
The ClientSeqs in the request are sequential with DocInfo.Checkpoint.ClientSeq.
The ClientSeq in the request sent to the PushPull API cannot be less than the ClientSeq in the server's ClientInfo. This addresses duplicate requests.
The ServerSeq in the request sent to the PushPull API cannot be greater than the ServerSeq maintained by the server. This addresses malicious requests.

I will refer to these four scenarios as Cases 1, 2, 3, and 4.

Upon reviewing the code, I found that Cases 3 and 4 were already implemented.

Thus, I only needed to implement Cases 1 and 2. However, I have written test code for Cases 1, 2, 3, and 4 and the happy case. I will explain the implementation for each case and the corresponding test code.

In the upcoming code blocks, the first line of each comment will indicate the file name and function name for your reference.

I also have a few questions. These questions are marked in bold and will be included in the Special notes for your reviewer section.

1.1 `Case 1`: The ClientSeqs in the request sent to the PushPull API are sequential

This validation is implemented in packs.go and occurs immediately when the PushPull function is called.

server/packs/packs.go

func validateClientSeqSequential(changes []*change.Change) error {
	if len(changes) <= 1 {
		return nil
	}

	nextClientSeq := changes[0].ClientSeq()
	for _, cn := range changes[1:] {
		nextClientSeq++

		if nextClientSeq != cn.ClientSeq() {
			return fmt.Errorf(
				"ClientSeq in Changes are not sequential (expected: %d, actual: %d) : %w",
				nextClientSeq,
				cn.ClientSeq(),
				ErrClientSeqNotSequential,
			)
		}
	}
	return nil
}

This function checks whether the Changes in the request are sequential. If there are 0 or 1 changes, there is nothing to validate, so it returns nil. If the ClientSeqs are not sequential, it returns ErrClientSeqNotSequential, which is mapped to connect.CodeInvalidArgument in the response via errorToConnectCode.

Q. It does not validate whether the first Change's ClientSeq differs by 1 from the ClientSeq in the existing ClientInfo. Should this be added?
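To illustrate how a sentinel error wrapped with %w can still be matched to a status code, here is a minimal sketch. It is not the actual connecthelper implementation: statusOf and newGapError are hypothetical helpers, and plain strings stand in for connect.Code so the example needs only the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrClientSeqNotSequential mirrors the sentinel error named in this PR.
var ErrClientSeqNotSequential = errors.New("ClientSeq in Changes are not sequential")

// newGapError is a hypothetical helper that wraps the sentinel with %w,
// the same way validateClientSeqSequential builds its error.
func newGapError(expected, actual uint32) error {
	return fmt.Errorf(
		"ClientSeq in Changes are not sequential (expected: %d, actual: %d) : %w",
		expected, actual, ErrClientSeqNotSequential,
	)
}

// statusOf is a hypothetical stand-in for a lookup over errorToConnectCode.
// It returns a string instead of a connect.Code to stay dependency-free.
func statusOf(err error) string {
	codes := map[error]string{
		ErrClientSeqNotSequential: "invalid_argument",
	}
	for sentinel, code := range codes {
		// errors.Is unwraps errors created with %w, so wrapped errors match.
		if errors.Is(err, sentinel) {
			return code
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(statusOf(newGapError(3, 5)))
}
```

Because the sentinel is wrapped rather than replaced, the detailed message (expected and actual values) is kept for logs while the caller can still branch on the error kind.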

server/packs/packs.go

func PushPull(
    ...
	// TODO: Changes may be reordered or missing during communication on the network.
	// We should check the change.pack with checkpoint to make sure the changes are in the correct order.
	err := validateClientSeqSequential(reqPack.Changes)
	if err != nil {
		return nil, err
	}
    ...

This validation runs at the location of the TODO comment above.

Q. Should we now remove this TODO comment?

1.2 `Case 2`: The ClientSeqs in the request are sequential with DocInfo.Checkpoint.ClientSeq

// packs/packs.go
func validateClientSeqSequentialWithCheckpoint(changes []*change.Change, checkpoint change.Checkpoint) error {
	expectedClientSeq := checkpoint.ClientSeq + 1
	actualFirstClientSeq := changes[0].ClientSeq()

	if expectedClientSeq < actualFirstClientSeq {
		return fmt.Errorf(
			"ClientSeq is not sequential with DocInfo.Checkpoint.ClientSeq (expected: %d, actual: %d) : %w",
			expectedClientSeq,
			actualFirstClientSeq,
			ErrClientSeqNotSequentialWithCheckpoint,
		)
	}
	return nil
}

Both cases return connect.CodeInvalidArgument.

// server/rpc/connecthelper/status.go

var errorToConnectCode = map[error]connect.Code{
	
       ...

	packs.ErrClientSeqNotSequentialWithCheckpoint: connect.CodeInvalidArgument,
	packs.ErrClientSeqInChangesAreNotSequential:   connect.CodeInvalidArgument,

}

...

1.3 `Case 3`: The ClientSeq in changes cannot be less than the ClientSeq in the Server's ClientInfo

This was already implemented, and it simply logs a warning. It returns OK as specified in the current issue.

pushpull.go, function pushChanges()

    ...
	cp := clientInfo.Checkpoint(docInfo.ID)

	var pushedChanges []*change.Change
	for _, cn := range reqPack.Changes {
		if cn.ID().ClientSeq() > cp.ClientSeq {
			serverSeq := docInfo.IncreaseServerSeq()
			cp = cp.NextServerSeq(serverSeq)
			cn.SetServerSeq(serverSeq)
			pushedChanges = append(pushedChanges, cn)
		} else {
			logging.From(ctx).Warnf(
				"change already pushed, clientSeq: %d, cp: %d",
				cn.ID().ClientSeq(),
				cp.ClientSeq,
			)
		}

		cp = cp.SyncClientSeq(cn.ClientSeq())
	}
    ...

1.4 `Case 4`: The ServerSeq in the request cannot be greater than the ServerSeq maintained by the server.

This was also already implemented. In the code below, initialServerSeq refers to the ServerSeq maintained in the Document Info.

Q. Currently, it returns connect.CodeFailedPrecondition instead of connect.CodeInvalidArgument as specified in the issue. Is this appropriate?

pushpull.go pullPack

    ...
    if initialServerSeq < reqPack.Checkpoint.ServerSeq {
		return nil, fmt.Errorf(
			"serverSeq of CP greater than serverSeq of clientInfo(clientInfo %d, cp %d): %w",
			initialServerSeq,
			reqPack.Checkpoint.ServerSeq,
			ErrInvalidServerSeq,
		)
	}
    ...

1.5 Test Code

The test code is located in server/packs/packs_test.go. It covers the following scenarios:

// server/packs/packs_test.go

func Test(t *testing.T) {
	t.Run("push/pull sequential ClientSeq test (happy case)", func(t *testing.T) {
		RunPushPullWithSequentialClientSeqTest(t)
	})

	t.Run("push/pull not sequential ClientSeq test", func(t *testing.T) {
		RunPushPullWithNotSequentialClientSeqTest(t)
	})

	t.Run("push/pull ClientSeq less than ClientInfo's ClientSeq (duplicated request)", func(t *testing.T) {
		RunPushPullWithClientSeqLessThanClientInfoTest(t)
	})

	t.Run("push/pull ServerSeq greater than DocInfo's ServerSeq", func(t *testing.T) {
		RunPushPullWithServerSeqGreaterThanDocInfoTest(t)
	})
}

happy case: Sequential ClientSeq in Client Request
Not Sequential with DocInfo.Checkpoint.ClientSeq
Not Sequential ClientSeq in Client Request Changes
duplicated request: ClientSeq in Request is less than ClientInfo's ClientSeq (But No Error)
ServerSeq in Request is greater than DocInfo's ServerSeq
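The scenarios listed above can be summarized as a small table of inputs and expected outcomes. The sketch below is a simplified model, not the code in this PR: serverState, request, and outcome are hypothetical names, a ChangePack is reduced to its sequence numbers, and the checkpoint passed to the validation is assumed to be the one the server keeps for this client and document.

```go
package main

import "fmt"

// serverState holds the two counters the server compares against.
type serverState struct {
	clientSeq uint32 // last ClientSeq recorded for this client and document
	serverSeq int64  // ServerSeq kept in DocInfo
}

// request is a ChangePack reduced to its sequence numbers.
type request struct {
	clientSeqs  []uint32 // ClientSeq of each Change, in order
	cpServerSeq int64    // Checkpoint.ServerSeq sent by the client
}

// outcome names the expected result of a PushPull call for each scenario.
func outcome(s serverState, r request) string {
	// Case 2: the first Change must not skip past the server's ClientSeq.
	if len(r.clientSeqs) > 0 && r.clientSeqs[0] > s.clientSeq+1 {
		return "error: not sequential with checkpoint"
	}
	// Case 1: every Change must follow its predecessor by exactly one.
	for i := 1; i < len(r.clientSeqs); i++ {
		if r.clientSeqs[i] != r.clientSeqs[i-1]+1 {
			return "error: not sequential in changes"
		}
	}
	// Case 4: the client cannot know a ServerSeq the server never issued.
	if r.cpServerSeq > s.serverSeq {
		return "error: invalid ServerSeq"
	}
	// Case 3: everything was already pushed, so it is ignored without error.
	if n := len(r.clientSeqs); n > 0 && r.clientSeqs[n-1] <= s.clientSeq {
		return "ok: duplicate ignored"
	}
	return "ok"
}

func main() {
	s := serverState{clientSeq: 2, serverSeq: 10}
	scenarios := []struct {
		name string
		req  request
	}{
		{"happy case", request{[]uint32{3, 4}, 10}},
		{"not sequential with checkpoint", request{[]uint32{5, 6}, 10}},
		{"not sequential in changes", request{[]uint32{3, 5}, 10}},
		{"duplicated request", request{[]uint32{1, 2}, 10}},
		{"ServerSeq too large", request{[]uint32{3}, 11}},
	}
	for _, sc := range scenarios {
		fmt.Printf("%s: %s\n", sc.name, outcome(s, sc.req))
	}
}
```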

2. Which issue(s) this PR fixes:

resolves #805

3. Special notes for your reviewer:

Questions 1 through 3 are addressed in the case descriptions above:

Case 1: The function validateClientSeqSequential() verifies that the Changes in the request are sequential. It does not validate whether the first Change's ClientSeq differs by 1 from the ClientSeq in the existing ClientInfo. Should this be added?
Case 1: Should we remove the TODO comment in the PushPull function in server/packs/packs.go?
Case 4: The issue specifies that malicious ServerSeqs should return connect.CodeInvalidArgument, but currently, it returns connect.CodeFailedPrecondition. Is this appropriate?
If client actions are required in response to these scenarios, we should discuss how to handle them.
- e.g., The client should be aware that Changes are not applied if the request is not sequential.

4. Does this PR introduce a user-facing change?:

5. Additional documentation:

Here, I have attached the questions I posted on the issue and the answers I found. I hope this will be helpful to those studying ClientSeq and ServerSeq. If there are any mistakes, please let me know.

I have also attached the responses from @sejongk to my questions. I learned a lot from his answers, and I am very grateful to him.

5.1 My Question And Answer

Hello,
I have a few questions as I work on understanding the content to resolve the issue.

Issue: Validate Checkpoint in ChangePack for PushPull API requests

Q1. Where do the server's `ClientSeq` and `ServerSeq` come from?

To resolve this issue, I thought it would be sufficient to compare the ClientSeq and ServerSeq of the client with those maintained by the server.

Is it correct that the server's ClientSeq and ServerSeq can be obtained from the Checkpoint of ClientInfo, which is derived from the ClientId in the Request Message?

Q2. Where do the `ClientSeq` and `ServerSeq` sent by the client come from?

The ChangePack in the Request's Msg contains a Checkpoint and an array of Changes.
The Checkpoint holds its ClientSeq and ServerSeq, and each Change in the array also has its ClientSeq and ServerSeq.
The issue mentions "so Change.ID.Checkpoint.ClientSeq should increment sequentially by one." Does this mean that the ClientSeq of the client should be the sequential values of each Change, and these should be validated against the server's ClientSeq (rather than the ClientSeq in the Checkpoint of the ChangePack)?

And regarding "Checkpoint.ServerSeq in the request ChangePack for PushPull API cannot be greater than the server's Checkpoint.ServerSeq since it is set when the server saves the Change to the database."

Should the client's ServerSeq be obtained from the ServerSeq in the Checkpoint of the ChangePack in the Msg (not the ServerSeq of each individual Change)?

Answer 1, 2

The server's ServerSeq can be found in DocInfo. Since the ServerSeq is updated every time a Change from various clients is applied, it is managed at the Document level.
The server's ClientSeq can be found in the Checkpoint of ClientInfo. Since the server needs to know the last ClientSeq sent by the client, it is managed at the ClientInfo level.
The client's ClientSeq is assigned by the client for each Change and is sequentially included in Changes. The ClientSeq can be found in both the Checkpoint and the Changes.
The client's ServerSeq can be found in reqPack.Checkpoint.ServerSeq. The client retains the last ServerSeq received during synchronization and includes it in the Checkpoint when sending a Request.
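As a rough picture of where each value lives, the sketch below models the four answers with simplified types. The struct and function names are illustrative only and are not Yorkie's real definitions; serverSide and clientSide are hypothetical helpers that just pick out the values being compared.

```go
package main

import "fmt"

// Checkpoint pairs the two counters discussed above.
type Checkpoint struct {
	ServerSeq int64
	ClientSeq uint32
}

// Change is reduced to the ClientSeq assigned by the client.
type Change struct {
	ClientSeq uint32
}

// ChangePack is what the client sends: its Checkpoint plus buffered Changes.
type ChangePack struct {
	Checkpoint Checkpoint
	Changes    []Change
}

// DocInfo holds the document-level ServerSeq.
type DocInfo struct {
	ServerSeq int64
}

// ClientInfo keeps one Checkpoint per attached document.
type ClientInfo struct {
	Checkpoints map[string]Checkpoint
}

// serverSide returns the values the server compares against.
func serverSide(doc DocInfo, client ClientInfo, docID string) (int64, uint32) {
	return doc.ServerSeq, client.Checkpoints[docID].ClientSeq
}

// clientSide returns the values taken from the request.
func clientSide(pack ChangePack) (int64, []uint32) {
	seqs := make([]uint32, 0, len(pack.Changes))
	for _, cn := range pack.Changes {
		seqs = append(seqs, cn.ClientSeq)
	}
	return pack.Checkpoint.ServerSeq, seqs
}

func main() {
	doc := DocInfo{ServerSeq: 10}
	client := ClientInfo{Checkpoints: map[string]Checkpoint{
		"doc-1": {ServerSeq: 8, ClientSeq: 2},
	}}
	pack := ChangePack{
		Checkpoint: Checkpoint{ServerSeq: 8, ClientSeq: 4},
		Changes:    []Change{{ClientSeq: 3}, {ClientSeq: 4}},
	}

	srvServerSeq, srvClientSeq := serverSide(doc, client, "doc-1")
	cliServerSeq, cliClientSeqs := clientSide(pack)
	fmt.Println("server:", srvServerSeq, srvClientSeq)
	fmt.Println("client:", cliServerSeq, cliClientSeqs)
}
```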

Q3. How is a "wrong" ClientSeq determined?

You mentioned that Change.ID.Checkpoint.ClientSeq should increment sequentially by one. Does this mean that the following scenarios are considered wrong?

If the ClientSeq of each Change in the ChangePack sent by the client is not sequential, the request is wrong.
If the ClientSeq in the Checkpoint sent by the client is ≤ the _ClientSeq in clientInfo, it is wrong.

Answer 3

Correct. There was an issue during the process of saving the Changes.
Correct. This corresponds to a duplicate request.

Q4. How is a duplicate request determined?

Would it be correct to consider the scenario where the ClientSeq sent by the client is equal to the _ClientSeq in clientInfo as a duplicate request due to network delay? I seem to be missing the criteria for determining a "duplicate request."

Answer 4

Answered in Answer 3.

Q5. How is a wrong ServerSeq determined?

Is it wrong when the ServerSeq sent by the client is greater than the ServerSeq obtained from ClientInfo on the server?

Answer 5

If the ServerSeq sent by the client is greater than the ServerSeq found in DocInfo on the server, it is a malicious request. This is because the ServerSeq held by the client is initially provided by the server. Since the ServerSeq increases sequentially, the client's ServerSeq should never be larger.

Q6. Where should the validation take place?

I believe it should occur in the PushPullChanges function in yorkie_server.go. Should I declare the validate function and error directly in yorkie_server.go, or should I create a validator file in the rpc package and declare the validate function and error there?

Answer 6

I was curious whether a separate object or layer was needed for validation. Based on Yorkie's existing validation methods, it seems unnecessary.

5.2 `sejongk`'s Answer

Understanding the concept of a Checkpoint seems crucial for this issue. I searched for relevant design documents and found the GC document, garbage-collection.md, though it might be good to add more detailed information about Checkpoints there or even create a separate Checkpoint document in the future.

You may already be familiar with Checkpoints, but I'll share what I reviewed today:

The Yorkie server imposes sequential consistency on the Changes generated by multiple clients. In other words, the server assigns each Change received during the push process an ID from a global counter called server_seq, which is incremented by 1 per Change. The docInfo.ServerSeq corresponds to this global counter.
Later, during the pull process, the server sends the Changes that the client has not yet received, along with the current server_seq that has been incremented up to that point. This server_seq serves to inform the client of how far it has received the Changes.
When the client performs a pushpull, it sends the server_seq it previously received to the server in reqPack.Checkpoint.ServerSeq, notifying the server of how far it has received and prompting the server to send the Changes corresponding to the subsequent server_seq.
The server keeps the last client_seq it received from the client as clientInfo.Checkpoint(docInfo.ID).ClientSeq, which is used to validate the client_seq of the Changes.
- ChangePack is used as a group for buffering Changes. The client_seq values within the ChangePack must be sorted and increase by 1. This way, a Change lost due to network issues can be detected through the client_seq. This is similar to how sequence numbers are assigned to each packet in TCP.
- It also solves the problem of duplicate Changes, as the server currently accepts a Change only when its client_seq is greater than the client_seq the server holds (if cn.ID().ClientSeq() > cp.ClientSeq).
The Checkpoint logic is crucial in GC, but since it's not the main point of this issue, I left it out. min_synced_seq is one of the core elements in GC.
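The flow described above can be sketched with a toy in-memory server. This is only an illustration under simplifying assumptions: a single document, no persistence, and hypothetical names (server, change, pushPull) that do not match Yorkie's real types. It leaves out GC and min_synced_seq entirely.

```go
package main

import "fmt"

// change carries the two sequence numbers plus the pushing client's ID.
type change struct {
	actor     string
	clientSeq uint32
	serverSeq int64
}

// server models one document: a global counter, an ordered log, and the
// last client_seq accepted from each client.
type server struct {
	serverSeq  int64
	log        []change
	clientSeqs map[string]uint32
}

func newServer() *server {
	return &server{clientSeqs: map[string]uint32{}}
}

// pushPull stores new changes from clientID, then returns the changes the
// client has not received yet together with the current server_seq.
func (s *server) pushPull(clientID string, cpServerSeq int64, pushed []change) ([]change, int64) {
	for _, cn := range pushed {
		if cn.clientSeq <= s.clientSeqs[clientID] {
			continue // already pushed: ignore the duplicate
		}
		s.serverSeq++ // each accepted change takes the next server_seq
		cn.actor = clientID
		cn.serverSeq = s.serverSeq
		s.log = append(s.log, cn)
		s.clientSeqs[clientID] = cn.clientSeq
	}

	var pulled []change
	for _, cn := range s.log {
		// Send only changes after the client's checkpoint, skipping its own.
		if cn.serverSeq > cpServerSeq && cn.actor != clientID {
			pulled = append(pulled, cn)
		}
	}
	return pulled, s.serverSeq
}

func main() {
	s := newServer()

	_, seqA := s.pushPull("a", 0, []change{{clientSeq: 1}, {clientSeq: 2}})
	pulledB, seqB := s.pushPull("b", 0, []change{{clientSeq: 1}})
	pulledA, seqA2 := s.pushPull("a", seqA, nil)

	fmt.Println("a after first push:", seqA)
	fmt.Println("b pulled:", len(pulledB), "server_seq:", seqB)
	fmt.Println("a pulled:", len(pulledA), "server_seq:", seqA2)
}
```

Resending client "a"'s first two changes afterwards leaves the log untouched, because their client_seq is not greater than the value the server already holds.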

After reviewing the above, it would be helpful to look at the diagrams explaining the Checkpoint flow at the following links:

You may want to review the code in these files:

server/packs/packs.go/func PushPull
server/packs/pushpull.go/func pushChanges
server/packs/pushpull.go/func pullChanges

6. Checklist:

validate ClientSeqs in the request sent to the PushPull API are sequential.
validate ClientSeq in the request sent to the PushPull API cannot be less than the ClientSeq in the Server's ClientInfo. This means duplicate requests.
validate ServerSeq in the request sent to the PushPull API cannot be greater than the ServerSeq maintained by the server. This means malicious requests.
Write test code

Summary by CodeRabbit

New Features
- Enhanced error handling for client sequence validation during document synchronization.
- Introduced new error types for non-sequential client sequences.
Bug Fixes
- Improved control flow in document processing to prevent invalid changes from being processed.
- Added specific error mappings to enhance feedback during connection issues related to client sequence validation.
Tests
- Added comprehensive unit tests for validating client sequence handling in push and pull operations.

…am#952) Added the handler to allow health checks to be performed with plain HTTP GET requests needed for traditional uptime checker or load balancer, along with existing gRPC health check.

….Changes

…, non-sequential ClientSeq, ClientSeq greater than ClientInfo's ClientSeq, and ServerSeq greater than DocInfo's ServerSeq.

CLAassistant · 2024-08-10T17:20:52Z

All committers have signed the CLA.

coderabbitai · 2024-08-10T17:20:53Z

Walkthrough

The updates improve error handling for client sequences in request packs, enhancing data integrity during synchronization. New error variables were introduced for non-sequential client IDs, and the PushPull function now includes robust validation checks. Additionally, comprehensive unit tests cover various client sequence scenarios, significantly improving the reliability and feedback mechanisms for client-server interactions.

Changes

Files	Change Summary
`server/packs/packs.go`, `server/packs/packs_test.go`	Added error variables for client sequence validation; implemented validation logic and corresponding unit tests.
`server/rpc/connecthelper/status.go`	Introduced new error mappings in connection logic to handle client sequence validation issues.

Assessment against linked issues

Objective	Addressed	Explanation
Validate `Checkpoint` in `ChangePack` for requests (#805)	✅
Ensure `ClientSeq` increments sequentially by one (#805)	✅
Handle invalid `Checkpoint` with proper exceptions (#805)	✅

Poem

In the meadow where data flows,
A rabbit hops, where the sequence goes.
With checks in place, errors in sight,
Synchronization dances, oh what a delight!
So here’s to the changes, both clever and wise,
Keeping our packs tidy, under bright skies! 🐇✨

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 80c6ea0 and b2472f4.

Files selected for processing (3)

server/packs/packs.go (4 hunks)
server/packs/packs_test.go (1 hunks)
server/rpc/connecthelper/status.go (2 hunks)

Additional comments not posted (9)

server/packs/packs.go (3)
42-43: LGTM: Error variable declaration.

The error variable ErrClientSeqNotSequential is appropriately defined for handling non-sequential client sequences.

287-306: LGTM: Client sequence validation function.

The validateClientSeqSequential function is well-implemented, ensuring client sequences are sequential and providing clear error messages.

86-89: LGTM: Integration of client sequence validation.

The PushPull function now validates client sequences, which enhances synchronization logic robustness.

Ensure that this validation does not introduce any unintended side effects.

Verification successful

Integration of validateClientSeqSequential is isolated to PushPull.

The validateClientSeqSequential function is used exclusively in the PushPull method, ensuring it does not introduce unintended side effects elsewhere in the codebase. The integration appears robust and focused on enhancing synchronization logic.

Location: server/packs/packs.go
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the usage of `validateClientSeqSequential` in the codebase.

# Test: Search for the function usage. Expect: Only occurrences of the new usage.
rg --type go -A 5 $'validateClientSeqSequential'
Length of output: 594

server/rpc/connecthelper/status.go (2)

51-51: LGTM: Error to connect code mapping.

The mapping of packs.ErrClientSeqNotSequential to connect.CodeInvalidArgument is appropriate for indicating malformed requests.

104-104: LGTM: Error to string code mapping.

The mapping of packs.ErrClientSeqNotSequential to "ErrClientSeqNotSequential" provides a clear string representation for error handling.

server/packs/packs_test.go (4)

47-76: LGTM: Test for sequential client sequences (happy case).

The test case RunPushPullWithSequentialClientSeqTest is well-structured and verifies that valid sequential client sequences are processed correctly.

78-107: LGTM: Test for non-sequential client sequences.

The test case RunPushPullWithNotSequentialClientSeqTest correctly asserts that an error is returned for non-sequential client sequences.

109-148: LGTM: Test for duplicate client sequences.

The test case RunPushPullWithClientSeqLessThanClientInfoTest ensures that duplicate client sequences do not cause errors, verifying validation robustness.

150-187: LGTM: Test for server sequence validation.

The test case RunPushPullWithServerSeqGreaterThanDocInfoTest correctly asserts an error is returned when the server sequence exceeds expectations, ensuring proper validation.

sejongk

Answers to your questions

1. Case 1: The function validateClientSeqSequential() verifies that the Changes in the request are sequential. It does not validate whether the first Change's ClientSeq differs by 1 from the ClientSeq in the existing ClientInfo. Should this be added?

I think it is necessary. The Yorkie server needs to ensure that changes are not missing and are causally ordered. Consider the following example:
If a server's ClientSeq is 1 and a client sends a change with ClientSeq 3, the server must check if a change with ClientSeq 2 is missing and reject the request, because the change with ClientSeq 3 may causally depend on the one with ClientSeq 2.
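The example above can be worked through in a few lines. missingClientSeqs is a hypothetical helper, not part of this PR; it only lists which ClientSeqs the server would still be waiting for.

```go
package main

import "fmt"

// missingClientSeqs lists the ClientSeqs that lie strictly between the last
// ClientSeq the server has recorded and the first ClientSeq in the request.
func missingClientSeqs(serverClientSeq, firstInRequest uint32) []uint32 {
	if firstInRequest <= serverClientSeq {
		return nil // duplicate or already-seen change: nothing is missing
	}
	var missing []uint32
	for seq := serverClientSeq + 1; seq < firstInRequest; seq++ {
		missing = append(missing, seq)
	}
	return missing
}

func main() {
	// Server holds ClientSeq 1 and the request starts at ClientSeq 3.
	fmt.Println(missingClientSeqs(1, 3)) // prints [2]
}
```

A non-empty result means the request should be rejected, since the first Change may depend on one of the missing Changes.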

2. Case 1: Should we remove the TODO comment in the PushPull function in server/packs/packs.go?

I think so.

3. Case 3: The issue specifies that malicious ServerSeqs should return connect.CodeInvalidArgument, but currently, it returns connect.CodeFailedPrecondition. Is this appropriate?

According to the current code,

InvalidArgument means the request is malformed.
FailedPrecondition means the request is rejected because the state of the system is not the desired state.

Receiving a malicious ServerSeq seems more related to the system's state than to value formatting, so I think FailedPrecondition is more appropriate.

4. If client actions are required in response to these scenarios, we should discuss how to handle them.

I agree. We can continue this discussion in a follow-up issue.

Additional Consideration

According to the suggestion regarding exception handling, it may be necessary to skip these validation checks and return ok if the requests are duplicates.
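One possible reading of this consideration is to drop the already-pushed Changes first and validate only what remains. The sketch below shows that idea with hypothetical helpers (dropAlreadyPushed, validateFresh) working on bare ClientSeq values; it assumes the ClientSeqs arrive in ascending order and is not the approach adopted in this PR.

```go
package main

import "fmt"

// dropAlreadyPushed returns the part of clientSeqs the server has not
// recorded yet. It assumes clientSeqs is sorted in ascending order.
func dropAlreadyPushed(clientSeqs []uint32, serverClientSeq uint32) []uint32 {
	for i, seq := range clientSeqs {
		if seq > serverClientSeq {
			return clientSeqs[i:]
		}
	}
	return nil
}

// validateFresh checks only the not-yet-pushed Changes: the first must
// directly follow serverClientSeq and each later one must increase by 1.
func validateFresh(clientSeqs []uint32, serverClientSeq uint32) error {
	expected := serverClientSeq
	for _, seq := range dropAlreadyPushed(clientSeqs, serverClientSeq) {
		expected++
		if seq != expected {
			return fmt.Errorf("expected ClientSeq %d, got %d", expected, seq)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateFresh([]uint32{1, 2}, 2)) // pure duplicate
	fmt.Println(validateFresh([]uint32{2, 3}, 2)) // partial duplicate
	fmt.Println(validateFresh([]uint32{4}, 2))    // ClientSeq 3 is missing
}
```

With this ordering, a pure duplicate request passes validation with nothing left to push, while a real gap is still reported.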

coderabbitai

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (1)

server/packs/packs.go (1)
88-88: Review the necessity of the TODO comment.

The newly implemented validation functions may address the concerns mentioned in the TODO comment about change ordering.
- // TODO: Changes may be reordered or missing during communication on the network.
- // We should check the change.pack with checkpoint to make sure the changes are in the correct order.
+ // Validation ensures changes are in the correct order with respect to the checkpoint.

Review details


Commits

Files that changed from the base of the PR and between b2472f4 and 511f8f1.

Files selected for processing (2)

server/packs/packs.go (4 hunks)
server/rpc/connecthelper/status.go (2 hunks)

Files skipped from review as they are similar to previous changes (1)

server/rpc/connecthelper/status.go

Additional comments not posted (5)

server/packs/packs.go (5)

41-47: Error variable declarations look good.

The error variables improve the clarity and specificity of error handling for client sequence validation.

291-301: Function validateClientSeqSequential is well-implemented.

The function effectively orchestrates the validation of client sequences with a modular approach.

303-316: Function validateClientSeqSequentialWithCheckpoint is correctly implemented.

The function accurately checks the sequence against the checkpoint and provides a clear error message.

318-333: Function validateClientSeqInChangesAreSequential is well-implemented.

The function accurately validates the sequential order of client sequences in changes.

89-93: Integration of client sequence validation in PushPull is appropriate.

The addition of validateClientSeqSequential enhances the robustness of the synchronization logic.

…heckpoint.ClientSeq

coderabbitai

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (1)

server/packs/packs.go (1)
302-314: Clarify error message for sequential validation.

The error message in validateClientSeqSequentialWithCheckpoint could be more descriptive to help diagnose issues more effectively. Consider specifying the expected and actual sequences in more detail.
-	"ClientSeq is not sequential with DocInfo.Checkpoint.ClientSeq (expected: %d, actual: %d) : %w",
+	"ClientSeq mismatch: expected sequential order with Checkpoint.ClientSeq (expected: %d, found: %d) : %w",

Review details


Commits

Files that changed from the base of the PR and between 511f8f1 and a32137e.

Files selected for processing (2)

server/packs/packs.go (4 hunks)
server/packs/packs_test.go (1 hunks)

Files skipped from review due to trivial changes (1)

server/packs/packs_test.go

Additional comments not posted (3)

server/packs/packs.go (3)
317-331: LGTM! Sequential validation logic is clear and correct.

The validateClientSeqInChangesAreSequential function is well-implemented. The logic is straightforward, and the error message is clear.

290-300: Verify integration and usage of validation function.

The validateClientSeqSequential function is well-structured and delegates tasks to helper functions. Verify its integration and usage within the codebase to ensure it functions as intended.

88-92: Verify error handling and control flow.

The integration of validateClientSeqSequential in the PushPull function appears correct. Ensure that the error handling and control flow align with the intended logic, particularly regarding the response codes for validation failures.

Verification successful

Error Handling in PushPull Function Verified

The error handling in the PushPull function is correctly implemented, returning nil, err when validateClientSeqSequential encounters an error. This follows standard Go practices for error handling.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the error handling and response codes in the PushPull function.

# Test: Search for error handling logic in PushPull function. Expect: Proper response codes and error messages.
rg --type go -A 5 'return nil, err' --context 10
Length of output: 225025

Script:
#!/bin/bash
# Description: Search for the PushPull function definition and its error handling in server/packs/packs.go.

# Test: Locate the PushPull function and analyze its error handling.
ast-grep --lang go --pattern $'func PushPull($_, $_, $_, $_, $_, $_, $_) ($_ , error) {
  $$$
  return nil, err
  $$$
}' server/packs/packs.go
Length of output: 143

Script:
#!/bin/bash
# Description: Locate the PushPull function and its error handling in server/packs/packs.go.

# Test: Search for the PushPull function definition and its error handling.
rg --type go -A 20 'func PushPull' server/packs/packs.go
Length of output: 1274

binary-ho · 2024-08-11T13:26:20Z

Hi @sejongk, here is a description of the additional code.

1. Add validation: `the first Change's ClientSeq differs by 1 from the ClientSeq in the existing docInfo`

// packs/packs.go
func validateClientSeqSequentialWithCheckpoint(changes []*change.Change, checkpoint change.Checkpoint) error {
	expectedClientSeq := checkpoint.ClientSeq + 1
	actualFirstClientSeq := changes[0].ClientSeq()

	if expectedClientSeq < actualFirstClientSeq {
		return fmt.Errorf(
			"ClientSeq is not sequential with DocInfo.Checkpoint.ClientSeq (expected: %d, actual: %d) : %w",
			expectedClientSeq,
			actualFirstClientSeq,
			ErrClientSeqNotSequentialWithCheckpoint,
		)
	}
	return nil
}

Here is the full validation code:

func validateClientSeqSequential(changes []*change.Change, checkpoint change.Checkpoint) error {
	if len(changes) < 1 {
		return nil
	}

	if err := validateClientSeqSequentialWithCheckpoint(changes, checkpoint); err != nil {
		return err
	}

	return validateClientSeqInChangesAreSequential(changes)
}

func validateClientSeqSequentialWithCheckpoint(changes []*change.Change, checkpoint change.Checkpoint) error {
	expectedClientSeq := checkpoint.ClientSeq + 1
	actualFirstClientSeq := changes[0].ClientSeq()

	if expectedClientSeq < actualFirstClientSeq {
		return fmt.Errorf(
			"ClientSeq is not sequential with DocInfo.Checkpoint.ClientSeq (expected: %d, actual: %d) : %w",
			expectedClientSeq,
			actualFirstClientSeq,
			ErrClientSeqNotSequentialWithCheckpoint,
		)
	}
	return nil
}

func validateClientSeqInChangesAreSequential(changes []*change.Change) error {
	nextClientSeq := changes[0].ClientSeq()
	for _, cn := range changes[1:] {
		nextClientSeq++

		if nextClientSeq != cn.ClientSeq() {
			return fmt.Errorf(
				"ClientSeq in Changes are not sequential (expected: %d, actual: %d) : %w",
				nextClientSeq,
				cn.ClientSeq(),
				ErrClientSeqInChangesAreNotSequential,
			)
		}
	}
	return nil
}

2. Separate `ErrClientSeqNotSequential` into two error cases

ErrClientSeqNotSequentialWithCheckpoint
ErrClientSeqInChangesAreNotSequential

3. Write additional test code

// packs/packs_test.go

func RunPushPullWithNotSequentialClientSeqWithCheckpoint(t *testing.T) {
	ctx := context.Background()
	be := setUpBackend(t)
	project, _ := be.DB.FindProjectInfoByID(
		ctx,
		database.DefaultProjectID,
	)

	clientInfo, _ := clients.Activate(ctx, be.DB, project.ToProject(), clientID)

	actorID, _ := time.ActorIDFromHex(clientID)
	changePackFixture, _ :=
		createChangePackFixture(helper.TestDocKey(t).String(), actorID.Bytes())

	docInfo, _ := documents.FindDocInfoByKeyAndOwner(
		ctx, be, clientInfo, changePackFixture.DocumentKey, true)
	err := clientInfo.AttachDocument(docInfo.ID, changePackFixture.IsAttached())
	if err != nil {
		assert.Fail(t, "failed to attach document")
	}

	_, err = packs.PushPull(ctx, be, project.ToProject(),
		clientInfo, docInfo, changePackFixture, packs.PushPullOptions{
			Mode:   types.SyncModePushPull,
			Status: document.StatusAttached,
		})
	if err != nil {
		assert.Fail(t, "failed to push pull")
	}

	changePackWithNotSequentialClientSeqWithCheckpoint, _ :=
		createChangePackWithNotSequentialClientSeqWithCheckpoint(helper.TestDocKey(t).String(), actorID.Bytes())
	_, err = packs.PushPull(ctx, be, project.ToProject(), clientInfo, docInfo,
		changePackWithNotSequentialClientSeqWithCheckpoint, packs.PushPullOptions{
			Mode:   types.SyncModePushPull,
			Status: document.StatusAttached,
		})
	assert.Equal(t, connecthelper.CodeOf(packs.ErrClientSeqNotSequentialWithCheckpoint), connecthelper.CodeOf(err))
}

4. Remove the TODO comments

I removed the following TODO comment from the PushPull function in packs/packs.go:

// TODO: Changes may be reordered or missing during communication on the network.
// We should check the change.pack with checkpoint to make sure the changes are in the correct order.


coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a32137e and 0217951.

Files selected for processing (1)

server/rpc/connecthelper/status.go (3 hunks)

Files skipped from review as they are similar to previous changes (1)

server/rpc/connecthelper/status.go

binary-ho · 2024-08-12T14:32:35Z

I changed it to return the CodeFailedPrecondition status code to the client in the case of ErrClientSeqNotSequentialWithCheckpoint.

Why?

@sejongk asked me whether it is appropriate for both ErrClientSeqNotSequentialWithCheckpoint and ErrClientSeqInChangesAreNotSequential to return InvalidArgument.

So I read the gRPC Core documentation, "Status codes and their use in gRPC".

Here is an excerpt from the document:

INVALID_ARGUMENT: The client specified an invalid argument.
Note that this differs from FAILED_PRECONDITION. INVALID_ARGUMENT indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name).
FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution.
For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc.

The difference lies in whether or not it is related to the system's status.

After reading the document,
I concluded that it would be more appropriate for the two errors to return different status codes:

ErrClientSeqNotSequentialWithCheckpoint -> FailedPrecondition
ErrClientSeqInChangesAreNotSequential -> InvalidArgument

As I understand it:

INVALID_ARGUMENT is an error that occurs when an argument falls outside the predefined set of "correct arguments", regardless of the system's "status."
FailedPrecondition is an error whose occurrence depends on the system's current "status."

The examples provided in the document align with this understanding:

INVALID_ARGUMENT: "The format of the file passed as an argument is incorrect" -> This is independent of the system's current status.
FailedPrecondition: "The rmdir operation cannot be executed because the directory is not empty" -> This is related to the current "status" of the directory. If the directory is empty, no error occurs, but if it is not empty, an error is triggered.

In the case of ErrClientSeqNotSequentialWithCheckpoint, whether the error occurs depends on the current value of the ClientSeq in the Checkpoint of ClientInfo. This is related to the system's "status."

On the other hand, the requirement that "ClientSeqs within Changes must be sequential" in ErrClientSeqInChangesAreNotSequential is unrelated to the server system's status. It only compares the arguments. The "correct argument" is defined, and it simply checks if the arguments conform. This is independent of the system's status.

Conclusion

Therefore, I believe it is better for these two situations to return different status codes:

ErrClientSeqNotSequentialWithCheckpoint -> FailedPrecondition
ErrClientSeqInChangesAreNotSequential -> InvalidArgument

taeng0204 and others added 14 commits August 7, 2024 22:12

Add HTTP health check handler for server health monitoring (yorkie-team#952)

495a54f

Added the handler to allow health checks to be performed with plain HTTP GET requests needed for traditional uptime checker or load balancer, along with existing gRPC health check.

Implement validation method to ensure sequential ClientSeq in reqPack.Changes

4344fce

Add ErrClientSeqNotSequential to errorToCode Mappers in yorkie.connect

c5abdcd

Write 'packs' test code for the following cases: sequential ClientSeq, non-sequential ClientSeq, ClientSeq greater than ClientInfo's ClientSeq, and ServerSeq greater than DocInfo's ServerSeq.

62e7f3d

Fix linting issues in packs_test

2b58d1d

Write 'packs' test code for the following cases: sequential ClientSeq, non-sequential ClientSeq, ClientSeq greater than ClientInfo's ClientSeq, and ServerSeq greater than DocInfo's ServerSeq.

2c48484

Merge branch 'main' into validate-checkpoint

f41ea88

Fix goimports issues in packs_test

6d13dc9

Fix goimports issues in packs_test

a348192

Rewrite test description

e447471

Fix goimports issues in packs_test.go

59392e0

Fix push/pull test where ClientSeq is less than ClientInfo's ClientSeq

7712bb8

Fix goimports issue at health_test.go

be485a7

Rollback health_test.go

b2472f4

coderabbitai bot reviewed Aug 10, 2024

sejongk self-requested a review August 11, 2024 03:23

sejongk requested changes Aug 11, 2024

Add Validation ClientSeq is Sequential With DocInfo.Checkpoint.ClientSeq

511f8f1

coderabbitai bot reviewed Aug 11, 2024

binary-ho added 4 commits August 11, 2024 20:31

Write Test Code about validate ClientSeq is sequential with DocInfo.Checkpoint.ClientSeq

1917a7b

Fix validate ClientSeq Sequential With Checkpoint

38cc7ed

Remove TODO comment about needs of Changes validate

4871d92

Fix lint issues: some lines too long

a32137e

coderabbitai bot reviewed Aug 11, 2024

binary-ho requested a review from sejongk August 11, 2024 13:27

sejongk assigned binary-ho Aug 12, 2024

sejongk requested a review from hackerwins August 12, 2024 13:15

Change to return CodeFailedPrecondition Status Code to the client in the case of `ErrClientSeqNotSequentialWithCheckpoint`.

0217951

coderabbitai bot reviewed Aug 12, 2024

sejongk marked this pull request as draft August 21, 2024 07:34

hackerwins force-pushed the main branch from 8eb4284 to b77c25a Compare September 5, 2024 08:36

binary-ho mentioned this pull request Nov 1, 2024

Add binary-ho to members yorkie-team/community#3

Merged

3 tasks


